We used the following packages for our analysis:
High dimensionality was the key challenge, e.g. the election data with its 471 columns. Our main tools for this were filter, pivot_longer, group_by and summarise.
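Here is a minimal sketch of this reshaping. It assumes a hypothetical wide layout with one `votes_<party>` column per party. The column names and values are illustrative and are not taken from the real 471-column file.

```r
suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
})

# Hypothetical wide election table: one row per municipality and election,
# one column per party (the real data has far more columns)
election_wide <- tibble(
  municipality = c("A", "A", "B"),
  date_elec    = as.Date(c("2019-04-28", "2019-11-10", "2019-04-28")),
  votes_PP     = c(120, 150, 40),
  votes_PSOE   = c(200, 180, 60),
  votes_VOX    = c(NA, 30, 5)
)

# Reshape to long format: one row per municipality, election and party
election_long <- election_wide |>
  pivot_longer(
    cols = starts_with("votes_"),
    names_to = "party",
    names_prefix = "votes_",
    values_to = "votes"
  ) |>
  filter(!is.na(votes))  # drop parties that did not run in that municipality

# Aggregate to national totals per election and party
national_totals <- election_long |>
  group_by(date_elec, party) |>
  summarise(votes = sum(votes), .groups = "drop")
```

The same pattern scales to hundreds of columns, because `starts_with()` selects every party column in one step.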
Using str_detect, we were able to track most of the regional branches and federations of the main parties and group them together.
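The grouping can be sketched with `case_when()` and `str_detect()`. The raw party labels and regular expressions below are hypothetical examples, not the exact patterns used in the analysis.

```r
suppressPackageStartupMessages({
  library(dplyr)
  library(stringr)
})

# Hypothetical raw party labels, including regional branches and coalitions
parties <- tibble(
  party_raw = c("PSOE", "PSC-PSOE", "PARTIDO POPULAR", "PP-FORO",
                "UNIDAS PODEMOS", "PODEMOS-IU-EQUO", "BNG")
)

# Map every branch to one standardized label; anything unmatched becomes
# "OTHER". case_when() returns the first condition that matches.
parties <- parties |>
  mutate(party = case_when(
    str_detect(party_raw, "PSOE") ~ "PSOE",
    str_detect(party_raw, "PARTIDO POPULAR|^PP\\b") ~ "PP",
    str_detect(party_raw, "PODEMOS") ~ "PODEMOS-IU",
    TRUE ~ "OTHER"
  ))
```

Because `case_when()` stops at the first matching branch, more specific patterns should come before broader ones.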
In the end, we created three aggregated tidy data files to work with, based on the survey data, the election data and the election turnout. All files use standardized party names and codes, as well as standardized election dates, which allows us to join them whenever needed in the analysis.
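The sketch below shows why the shared keys matter. It joins hypothetical slices of the survey and election files on the standardized party code and the election date. The pollsters `pollster_a` and `pollster_b` and all values are made up.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical slice of the election file: actual national share (proportion)
election <- tibble(
  date_elec      = as.Date(c("2019-04-28", "2019-04-28")),
  party          = c("PP", "PSOE"),
  national_share = c(0.20, 0.30)
)

# Hypothetical slice of the survey file: poll estimates in %
surveys <- tibble(
  date_elec = as.Date(c("2019-04-28", "2019-04-28", "2019-04-28")),
  pollster  = c("pollster_a", "pollster_a", "pollster_b"),
  party     = c("PP", "PSOE", "PP"),
  votes     = c(21, 29, 19)
)

# Attach the actual result to every survey estimate via the shared keys
survey_vs_result <- surveys |>
  left_join(election, by = c("date_elec", "party"))
```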
There were only 48 municipalities with over 100k inhabitants. Two of them, Cádiz and Dos Hermanas, had a population that was above 100k in some periods and below it in others. We created a table with the winner and its total vote share (%) in each of these municipalities and plotted the results over time. PP has strongholds in some of these areas.
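Here is a sketch of both steps on toy data. It assumes one row per municipality, election and party, plus a `population` column. The municipalities `Big` and `Border` and their figures are hypothetical. The first query flags municipalities that cross the 100k threshold in only some elections, as Cádiz and Dos Hermanas do. The second query picks each election's winner and its vote share.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical results: one row per municipality, election and party
results <- tibble(
  municipality = rep(c("Big", "Border"), each = 4),
  date_elec    = rep(as.Date(c("2016-06-26", "2016-06-26",
                               "2019-04-28", "2019-04-28")), times = 2),
  population   = c(250000, 250000, 260000, 260000,
                   99000, 99000, 101000, 101000),
  party        = rep(c("PP", "PSOE"), times = 4),
  votes        = c(60, 40, 45, 55, 30, 70, 52, 48)
)

# Municipalities above 100k in some elections but not in others
border_cases <- results |>
  distinct(municipality, date_elec, population) |>
  group_by(municipality) |>
  summarise(
    always_big = all(population > 100000),
    ever_big   = any(population > 100000)
  ) |>
  filter(ever_big, !always_big)

# Winner and its vote share (%) per large municipality and election
winners <- results |>
  filter(population > 100000) |>
  group_by(municipality, date_elec) |>
  mutate(share = 100 * votes / sum(votes)) |>
  slice_max(votes, n = 1) |>
  ungroup()
```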
The original code was a loop that analyzed the votes of the second-placed party in every election, breaking the results down by the size (population) of the municipalities where people voted.
```r
library(dplyr)
library(ggplot2)

# Step 1: Create population categories with ordered factor levels for PP
population_levels <- c("<10.000", ">= 10.000 & < 50.000", ">= 50.000 & < 100.000",
                       ">= 100.000 & < 500.000", ">= 500.000 & < 1.000.000",
                       ">= 1.000.000")

pp_first <- pp_first |>
  mutate(
    population_category = factor(
      case_when(
        population < 10000 ~ "<10.000",
        population >= 10000 & population < 50000 ~ ">= 10.000 & < 50.000",
        population >= 50000 & population < 100000 ~ ">= 50.000 & < 100.000",
        population >= 100000 & population < 500000 ~ ">= 100.000 & < 500.000",
        population >= 500000 & population < 1000000 ~ ">= 500.000 & < 1.000.000",
        population >= 1000000 ~ ">= 1.000.000"
      ),
      levels = population_levels
    )
  )

# Step 2: Loop through the elections and create one plot per election for PP.
# party_colors (a named vector of party colours) is defined earlier in the analysis.
unique_dates_pp <- sort(unique(pp_first$date_elec))
plots_pp <- list()

# Loop over indices: a for loop over a Date vector strips the Date class
for (i in seq_along(unique_dates_pp)) {
  current_date <- as.Date(unique_dates_pp[i])

  # Total votes of the second-placed party by population category
  data_filtered <- pp_first |>
    filter(date_elec == current_date) |>
    group_by(population_category, second_party) |>
    summarise(total_votes = sum(second_votes, na.rm = TRUE), .groups = "drop")

  p <- ggplot(data_filtered,
              aes(x = population_category, y = total_votes, fill = second_party)) +
    geom_col(position = "dodge", width = 0.7) +
    scale_fill_manual(values = party_colors) +
    scale_y_continuous(labels = scales::comma) +
    labs(
      title = paste("Second Party by Population for Election on",
                    format(current_date, "%Y-%m-%d")),
      x = "Inhabitants per city",
      y = "Total Votes"
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
      axis.text.x = element_text(size = 10, angle = 45, hjust = 1),
      axis.text.y = element_text(size = 10),
      legend.position = "bottom",
      legend.title = element_blank()  # remove legend title
    )

  # Store the plot under its election date
  plots_pp[[as.character(current_date)]] <- p
}

# Step 3: Display all plots for PP, one at a time
for (p in plots_pp) print(p)
```
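As a design alternative, base R's `cut()` builds the same factor as the `case_when()` chain in Step 1 with a single call. Setting `right = FALSE` makes every interval closed on the left, for example [10000, 50000). The helper name `categorize_population()` is hypothetical.

```r
# Break points and labels that reproduce the Step 1 categories
population_breaks <- c(-Inf, 1e4, 5e4, 1e5, 5e5, 1e6, Inf)
population_labels <- c("<10.000", ">= 10.000 & < 50.000", ">= 50.000 & < 100.000",
                       ">= 100.000 & < 500.000", ">= 500.000 & < 1.000.000",
                       ">= 1.000.000")

# Hypothetical helper: returns a factor whose levels follow population_labels
categorize_population <- function(population) {
  cut(population, breaks = population_breaks,
      labels = population_labels, right = FALSE)
}

categorize_population(c(9999, 10000, 100000, 2e6))
```

In the analysis, this could stand in for Step 1 as `pp_first |> mutate(population_category = categorize_population(population))`.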





Low-turnout areas (<70.7%): PSOE and PP dominate, securing the highest vote shares. Smaller parties such as VOX, CS and EH-BILDU have a moderate presence, while regional parties and new entrants struggle to gain significant traction.
Extremely low-turnout areas (<45%): EAJ-PNV leads, followed closely by PSOE and EH-BILDU. Traditional parties such as PP still retain some influence, while smaller parties have minimal presence.
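Vote shares within each turnout band can be computed as sketched below, using toy data and the thresholds quoted above. In this sketch the bands are nested, so a municipality below 45% also counts as low turnout. Whether the original analysis treated the bands this way is an assumption.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical municipality-level data: turnout in % and votes per party
muni_votes <- tibble(
  municipality = c("A", "A", "B", "B", "C", "C"),
  turnout      = c(40, 40, 65, 65, 80, 80),
  party        = c("EAJ-PNV", "PSOE", "PP", "PSOE", "PP", "PSOE"),
  votes        = c(50, 30, 40, 60, 70, 30)
)

# Party vote shares (%) among municipalities below a turnout threshold
band_shares <- function(data, max_turnout) {
  data |>
    filter(turnout < max_turnout) |>
    group_by(party) |>
    summarise(votes = sum(votes), .groups = "drop") |>
    mutate(share = 100 * votes / sum(votes)) |>
    arrange(desc(share))
}

low     <- band_shares(muni_votes, 70.7)
extreme <- band_shares(muni_votes, 45)
```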
There is no significant overall correlation between population size (log-transformed) and the parties' vote percentages: the red trend line is essentially flat.
Vote-share trends across population sizes (log-transformed) differ by party: PP performs better in less populated municipalities, while PSOE and PODEMOS-IU perform better in more populated ones.
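Per-party and pooled correlations between `log10(population)` and vote share show how these two statements relate. In the toy data below (hypothetical values), opposite per-party trends cancel out in the pooled correlation. This is one way a flat overall line can coexist with clear party-level patterns.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical vote shares (%) at four population sizes for two parties
shares <- tibble(
  population = rep(c(1e3, 1e4, 1e5, 1e6), times = 2),
  party      = rep(c("PP", "PSOE"), each = 4),
  share      = c(40, 35, 30, 25,   # PP falls as population grows
                 20, 25, 30, 35)   # PSOE rises as population grows
)

# Pearson correlation between log10(population) and share, per party
party_cor <- shares |>
  group_by(party) |>
  summarise(r = cor(log10(population), share), .groups = "drop")

# Pooling all parties hides the opposite per-party trends
overall_r <- cor(log10(shares$population), shares$share)
```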
Support for several parties differs significantly between rural and urban areas: EH-BILDU and BNG have higher support in rural areas, while MP, CS and PODEMOS-IU are stronger in urban areas.
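Here is a descriptive sketch of the rural/urban comparison. It assumes a hypothetical cut-off of 10,000 inhabitants for "rural", since the source does not state its definition, and uses toy vote shares. The sketch only computes the gap in mean share. Claiming significance would also require a test such as `t.test()` or `wilcox.test()` on the municipality-level shares.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical municipality-level shares (%) for two parties
shares <- tibble(
  population = c(2e3, 5e3, 5e4, 2e5, 2e3, 5e3, 5e4, 2e5),
  party      = rep(c("BNG", "CS"), each = 4),
  share      = c(12, 10, 6, 4,  3, 5, 11, 13)
)

# Assumed rule: fewer than 10,000 inhabitants counts as rural
rural_urban <- shares |>
  mutate(area = if_else(population < 10000, "rural", "urban")) |>
  group_by(party) |>
  summarise(
    rural = mean(share[area == "rural"]),
    urban = mean(share[area == "urban"]),
    .groups = "drop"
  ) |>
  mutate(gap = rural - urban)  # positive = stronger in rural areas
```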
Are polls more precise as the election gets closer?
Are surveys with larger samples more precise?
Which polling firms were the most accurate, and which deviated the most from the results?
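The first two questions can be framed as correlations between a poll's absolute error and two quantities: the number of days between fieldwork and the election, and the sample size. The columns `field_date` and `sample_size` and all values below are hypothetical.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical polls for one party, with the actual result (both in %)
polls <- tibble(
  date_elec      = as.Date("2019-04-28"),
  field_date     = as.Date(c("2019-03-01", "2019-04-01", "2019-04-20")),
  sample_size    = c(800, 1500, 3000),
  votes          = c(24, 26.5, 28),
  national_share = 29
)

precision <- polls |>
  mutate(
    error = abs(votes - national_share),
    days_to_election = as.numeric(date_elec - field_date)
  )

# Positive: polls closer to the election err less.
# Negative: larger samples err less.
cor_days   <- cor(precision$days_to_election, precision$error)
cor_sample <- cor(precision$sample_size, precision$error)
```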
Measurement criterion: weighted mean absolute error (WMAE), computed per pollster as the sum of weight × absolute error divided by the sum of the weights. Weighting: 0.7 for the five parties receiving the most votes in each general election and 0.3 for the remaining parties.
```r
library(dplyr)
library(forcats)

# Rank parties by actual national vote share within each election and
# weight the top five parties 0.7 and all other parties 0.3
final_election_summary_v2 <- final_election_summary |>
  group_by(date_elec) |>
  arrange(date_elec, desc(national_share)) |>
  mutate(
    rank_pos = dense_rank(-national_share),
    weight = if_else(rank_pos <= 5, 0.7, 0.3)
  ) |>
  ungroup()

# national_share is a proportion: convert it to % so it is on the same
# scale as the poll estimates in `votes`, then take the absolute error
final_election_summary_v3 <- final_election_summary_v2 |>
  mutate(
    national_share = national_share * 100,
    error = abs(votes - national_share)
  )

# WMAE per pollster; factor levels are ordered from the lowest
# (most accurate) to the highest WMAE
wmae <- final_election_summary_v3 |>
  group_by(pollster) |>
  summarise(WMAE = sum(weight * error) / sum(weight)) |>
  mutate(pollster = fct_reorder(pollster, WMAE, .desc = FALSE))
```

```
[1] IMOP SOCIOMÉTRICA APPEND METRA SEIS VOX PÚBLICA
46 Levels: IMOP SOCIOMÉTRICA APPEND METRA SEIS VOX PÚBLICA ... NETQUEST
[1] DYM SIMPLE LÓGICA METROSCOPIA MYWORD NETQUEST
46 Levels: IMOP SOCIOMÉTRICA APPEND METRA SEIS VOX PÚBLICA ... NETQUEST
```
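Here is a toy check of the weighting. It assumes the same column meanings as above (`national_share` as a proportion, `votes` as the poll estimate in %) for one hypothetical pollster and six parties.

```r
suppressPackageStartupMessages(library(dplyr))

# Toy data: one election, one pollster, six parties
toy <- tibble(
  date_elec      = as.Date("2019-04-28"),
  pollster       = "pollster_a",
  party          = paste0("P", 1:6),
  national_share = c(0.30, 0.25, 0.15, 0.12, 0.10, 0.08),
  votes          = c(28, 27, 15, 10, 10, 10)
)

# Same ranking, weighting and error steps as the WMAE code above
toy_wmae <- toy |>
  group_by(date_elec) |>
  mutate(
    rank_pos = dense_rank(-national_share),
    weight   = if_else(rank_pos <= 5, 0.7, 0.3)
  ) |>
  ungroup() |>
  mutate(error = abs(votes - 100 * national_share)) |>
  group_by(pollster) |>
  summarise(WMAE = sum(weight * error) / sum(weight))
```

The five largest parties get weight 0.7 and the sixth gets 0.3. This gives WMAE = (0.7 × 6 + 0.3 × 2) / 3.8 = 4.8 / 3.8 ≈ 1.26, slightly below the unweighted MAE of 8 / 6 ≈ 1.33.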
Trends in survey prediction biases over time, focusing on bias evolution and accuracy for each party.
How has the turnout rate changed over time? And within each election year, how are turnout rates correlated with the municipalities’ populations?
There is no clear trend in turnout over time. However, it is interesting that the two ‘snap’ elections, in 2016 and November 2019, were each held shortly after the previous election, and that turnout in both was markedly lower than in the preceding one. This suggests that such ‘emergency’ elections may reduce citizens’ willingness to vote again.
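The gap between consecutive elections and the change in turnout can be computed with `lag()`. The election dates below are the Spanish general elections from 2015 to 2019. The turnout values and the 365-day "snap" rule are illustrative assumptions.

```r
suppressPackageStartupMessages(library(dplyr))

# Election dates are real; turnout values (%) are illustrative only
turnout <- tibble(
  date_elec = as.Date(c("2015-12-20", "2016-06-26", "2019-04-28", "2019-11-10")),
  turnout   = c(70, 66, 72, 66)
)

turnout_changes <- turnout |>
  arrange(date_elec) |>
  mutate(
    days_since_previous = as.numeric(date_elec - lag(date_elec)),
    turnout_change      = turnout - lag(turnout),
    snap = days_since_previous < 365   # hypothetical "snap election" rule
  )
```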
Apart from the election year and the specific election context, how are turnout rates correlated with the municipalities’ populations within each election year?
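Within each election year, a rank correlation can summarise this association. The sketch below applies Spearman's rho to toy data with hypothetical values. Because Spearman's rho is rank-based, it gives the same result whether or not population is log-transformed.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical municipality-level turnout (%) for two election years
muni <- tibble(
  year       = rep(c(2016, 2019), each = 4),
  population = rep(c(500, 5e3, 5e4, 5e5), times = 2),
  turnout    = c(75, 72, 68, 65,   70, 71, 69, 72)
)

# Spearman's rank correlation between population and turnout, per year
year_cor <- muni |>
  group_by(year) |>
  summarise(
    rho = cor(population, turnout, method = "spearman"),
    .groups = "drop"
  )
```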
Support for smaller parties increased significantly over time, peaking in most autonomous communities in April 2019.